YT Assignment MC2

A short description of the post.

HONG Yun Ting true
07-01-2021

Overview

GAStech is a company that is located in a country island of Kronos and it has come to their attention that some of the employees had mysteriously went missing. Vehicles tracking data that was secretly installed in the company’s cars and Kronos-Kares benefit card information are delivered to authorities for investigation.

Literature review of existing analysis performed

3. Extracting, wrangling and preparing the input data

3.1 Setting up environment

packages = c('tidyverse', 'lubridate', 'dplyr', 'raster', 'clock', 'sf', 'tmap', 'plotly', 'ggplot2', 'mapview', 'rgdal',
             'rgeos', 'sqldf')

for (p in packages) {
  if (!require(p, character.only = T)) {
    install.packages(p, repos = "http://cran.us.r-project.org")
  }
  library(p, character.only = T)
}

Importing Data

# A tibble: 44 x 5
   LastName    FirstName CarID CurrentEmploymentT… CurrentEmploymentT…
   <chr>       <chr>     <dbl> <chr>               <chr>              
 1 Calixto     Nils          1 Information Techno… IT Helpdesk        
 2 Azada       Lars          2 Engineering         Engineer           
 3 Balas       Felix         3 Engineering         Engineer           
 4 Barranco    Ingrid        4 Executive           SVP/CFO            
 5 Baza        Isak          5 Information Techno… IT Technician      
 6 Bergen      Linnea        6 Information Techno… IT Group Manager   
 7 Orilla      Elsa          7 Engineering         Drill Technician   
 8 Alcazar     Lucas         8 Information Techno… IT Technician      
 9 Cazar       Gustav        9 Engineering         Drill Technician   
10 Campo-Corr… Ada          10 Executive           SVP/CIO            
# … with 34 more rows
ccData <- read_csv("MC2/cc_data.csv") %>%
  mutate(Date = as.Date(mdy_hm(timestamp)), Time = format(mdy_hm(timestamp), "%H:%M"))

loyaltyData <- read_csv("MC2/loyalty_data.csv") %>%
  mutate(Date = as.Date(mdy(timestamp)))

ccLoyalty <- left_join(ccData, loyaltyData, by = c("Date", "location", "price")) %>%
  dplyr::select(Date, Time, location, price, last4ccnum, loyaltynum) %>%
  group_by(last4ccnum)

ccLoyalty$weekday = wday(ccLoyalty$Date, label = TRUE, abbr = TRUE) 

ccLoyalty
# A tibble: 1,496 x 7
# Groups:   last4ccnum [55]
   Date       Time  location       price last4ccnum loyaltynum weekday
   <date>     <chr> <chr>          <dbl>      <dbl> <chr>      <ord>  
 1 2014-01-06 07:28 Brew've Been … 11.3        4795 L8566      Mon    
 2 2014-01-06 07:34 Hallowed Grou… 52.2        7108 <NA>       Mon    
 3 2014-01-06 07:35 Brew've Been …  8.33       6816 L8148      Mon    
 4 2014-01-06 07:36 Hallowed Grou… 16.7        9617 L5553      Mon    
 5 2014-01-06 07:37 Brew've Been …  4.24       7384 L3800      Mon    
 6 2014-01-06 07:38 Brew've Been …  4.17       5368 L2247      Mon    
 7 2014-01-06 07:42 Coffee Camele… 28.7        7253 <NA>       Mon    
 8 2014-01-06 07:43 Brew've Been …  9.6        4948 L9406      Mon    
 9 2014-01-06 07:43 Brew've Been … 16.9        9683 L7291      Mon    
10 2014-01-06 07:47 Hallowed Grou… 16.5        8129 L8328      Mon    
# … with 1,486 more rows
# A tibble: 9 x 7
# Groups:   last4ccnum [9]
  Date       Time  location        price last4ccnum loyaltynum weekday
  <date>     <chr> <chr>           <dbl>      <dbl> <chr>      <ord>  
1 2014-01-06 07:28 Brew've Been S… 11.3        4795 L8566      Mon    
2 2014-01-06 07:35 Brew've Been S…  8.33       6816 L8148      Mon    
3 2014-01-06 07:37 Brew've Been S…  4.24       7384 L3800      Mon    
4 2014-01-06 07:38 Brew've Been S…  4.17       5368 L2247      Mon    
5 2014-01-06 07:43 Brew've Been S…  9.6        4948 L9406      Mon    
6 2014-01-06 07:43 Brew've Been S… 16.9        9683 L7291      Mon    
7 2014-01-06 07:52 Brew've Been S… 32.8        9405 <NA>       Mon    
8 2014-01-06 07:55 Brew've Been S…  3.67       3484 L2490      Mon    
9 2014-01-06 07:59 Brew've Been S…  8.88       7819 L5259      Mon    
# A tibble: 685,169 x 6
   Timestamp           id      lat  long date       Time 
   <dttm>              <fct> <dbl> <dbl> <date>     <chr>
 1 2014-01-06 06:28:01 35     36.1  24.9 2014-01-06 06:28
 2 2014-01-06 06:28:01 35     36.1  24.9 2014-01-06 06:28
 3 2014-01-06 06:28:03 35     36.1  24.9 2014-01-06 06:28
 4 2014-01-06 06:28:05 35     36.1  24.9 2014-01-06 06:28
 5 2014-01-06 06:28:06 35     36.1  24.9 2014-01-06 06:28
 6 2014-01-06 06:28:07 35     36.1  24.9 2014-01-06 06:28
 7 2014-01-06 06:28:09 35     36.1  24.9 2014-01-06 06:28
 8 2014-01-06 06:28:10 35     36.1  24.9 2014-01-06 06:28
 9 2014-01-06 06:28:11 35     36.1  24.9 2014-01-06 06:28
10 2014-01-06 06:28:12 35     36.1  24.9 2014-01-06 06:28
# … with 685,159 more rows

Filtering variables

These variables can be changed to a different car ID (Car 1-35) and GPS date(2014-01-06 to 2014-01-19)

Car coordinates when vehicle stopped

I am eliminating coordinates that indicating that the car is moving The GPS car coordinates are recorded every 1-5 secs. Therefore, if there is a GPS record difference of more than a min, which means the employee has driven the car to a destination

  1. I am getting the first and last car coordinate each day
  2. Getting the car stopped locations through the days
# A tibble: 113 x 10
# Groups:   date [14]
   Timestamp           id      lat  long date       Time  diff   count
   <dttm>              <fct> <dbl> <dbl> <date>     <chr> <drtn> <int>
 1 2014-01-06 07:20:01 1      36.1  24.9 2014-01-06 07:20   0.0…     1
 2 2014-01-06 07:22:04 1      36.1  24.9 2014-01-06 07:22  34.9…    79
 3 2014-01-06 08:04:09 1      36.0  24.9 2014-01-06 08:04 252.8…   405
 4 2014-01-06 12:26:27 1      36.1  24.9 2014-01-06 12:26  58.5…   854
 5 2014-01-06 13:34:27 1      36.0  24.9 2014-01-06 13:34 249.5…  1307
 6 2014-01-06 17:48:03 1      36.1  24.9 2014-01-06 17:48 107.9…  1484
 7 2014-01-06 19:42:27 1      36.1  24.9 2014-01-06 19:42   6.5…  1787
 8 2014-01-06 19:49:01 1      36.1  24.9 2014-01-06 19:49  38.0…  1788
 9 2014-01-06 20:33:30 1      36.1  24.9 2014-01-06 20:33  97.5…  2083
10 2014-01-06 22:15:02 1      36.0  24.9 2014-01-06 22:15  45.9…  2262
# … with 103 more rows, and 2 more variables: FIRST <lgl>, LAST <lgl>

Car coordinates when vehicle stopped

I am eliminating coordinates that indicating that the car is moving The GPS car coordinates are recorded every 1-5 secs. Therefore, if there is a GPS record difference of more than a min, which means the employee has driven the car to a destination

  1. I am getting the first and last car coordinate each day
  2. Getting the car stopped locations through the days
# A tibble: 95 x 10
# Groups:   date [14]
   Timestamp           id      lat  long date       Time  diff   count
   <dttm>              <fct> <dbl> <dbl> <date>     <chr> <drtn> <int>
 1 2014-01-06 07:30:01 2      36.1  24.9 2014-01-06 07:30   0.0…     1
 2 2014-01-06 07:34:22 2      36.1  24.9 2014-01-06 07:34  36.6…   144
 3 2014-01-06 08:31:11 2      36.0  24.9 2014-01-06 08:31 225.8…  1043
 4 2014-01-06 12:20:42 2      36.1  24.9 2014-01-06 12:20  65.3…  1204
 5 2014-01-06 13:29:42 2      36.0  24.9 2014-01-06 13:29 227.3…  1356
 6 2014-01-06 17:27:28 2      36.1  24.9 2014-01-06 17:27  87.5…  1839
 7 2014-01-06 18:57:11 2      36.1  24.9 2014-01-06 18:57   2.8…  1931
 8 2014-01-06 19:00:01 2      36.1  24.9 2014-01-06 19:00 109.0…  1932
 9 2014-01-06 20:52:01 2      36.1  24.9 2014-01-06 20:52   0.0…  2009
10 2014-01-07 08:02:01 2      36.1  24.9 2014-01-07 08:02   0.0…     1
# … with 85 more rows, and 2 more variables: FIRST <lgl>, LAST <lgl>

class      : RasterLayer 
band       : 1  (of  3  bands)
dimensions : 1618, 2716, 4394488  (nrow, ncol, ncell)
resolution : 3.148491e-05, 3.148491e-05  (x, y)
extent     : 24.82429, 24.9098, 36.04476, 36.0957  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs 
source     : /Users/yuntinghong/Documents/SMU/ISSS608 - Visual Analytics/hongyunting/ISSS608Project/_posts/2021-07-04-yt-assignment-mc2/MC2/Geospatial/MC2-tourist_modified.tif 
names      : MC2.tourist_modified 
values     : 0, 255  (min, max)

OGR data source with driver: ESRI Shapefile 
Source: "/Users/yuntinghong/Documents/SMU/ISSS608 - Visual Analytics/hongyunting/ISSS608Project/_posts/2021-07-04-yt-assignment-mc2/MC2/Geospatial", layer: "Abila"
with 3290 features
It has 9 fields
Integer64 fields read as strings:  TLID 
class       : SpatialLinesDataFrame 
features    : 3290 
extent      : 24.82401, 24.90997, 36.04501, 36.09492  (xmin, xmax, ymin, ymax)
crs         : +proj=longlat +datum=WGS84 +no_defs 
variables   : 9
names       :      TLID, FEDIRP,  FENAME, FETYPE, FEDIRS, FRADDL, TOADDL, FRADDR, TOADDR 
min values  : 184619449,      N,   Acera,    Ave,      N,      1,     20,      1,     17 
max values  : 184714869,      S, Zefirou,    Way,      N,   7400,   7498,   7401,   7499 

Map view of Abila, Kronos’s route and Employees’ whereabout

   id                       desc
1   1       Calixto Nils's house
2   2         Azada Lars's house
3   3        Balas Felix's house
4   4    Barranco Ingrid's house
5   5          Baza Isak's house
6   6      Bergen Linnea's house
7   7        Orilla Elsa's house
8   8      Alcazar Lucas's house
9   9       Cazar Gustav's house
10 10 Campo-Corrente Ada's house
11 11        Calzas Axel's house
12 12    Cocinaro Hideki's house
13 13         Ferro Inga's house
14 14      Dedos Lidelse's house
15 15     Bodrogi Loreto's house
16 16          Vann Isia's house
17 17        Flecha Sven's house
18 18    Frente Birgitta's house
19 19        Frente Vira's house
20 20       Fusil Stenig's house
21 21     Osvaldo Hennie's house
22 22      Nubarron Adra's house
23 23        Lagos Varja's house
24 24         Mies Minke's house
25 25      Herrero Kanon's house
26 26         Onda Marin's house
27 27        Orilla Kare's house
28 28    Borrasca Isande's house
29 29      Ovan Bertrand's house
30 30      Resumir Felix's house
31 31  Sanjorge Jr. Sten's house
32 32        Strum Orhan's house
33 33    Tempestad Brand's house
34 34        Vann Edvard's house
35 35  Vasco-Pais Willem's house